\[~\]

The Final Project

\[~\]

Author \(\rightarrow\) Jeremy Sapienza 1960498
Statistics for Data Science and Laboratory II
Sapienza University of Rome
July 21st, 2021

\[~\]

Guessing Whether There Is a Heart Attack or Not

\[~\]

\[~\]

\[~\]

\[~\]

The Introduction

\[~\]

Nowadays, with the improvement of technology, there is an amount of data available that allows us to understand whether or not there is a problem with a person's health.

In this project we explore the data and try to predict whether a patient should be diagnosed with heart disease or not.

\[~\]

The Dataset

\[~\]

\[~\]

Kaggle is the platform where I found this interesting dataset, to which we apply Bayesian inference as the main scope of this project.

This dataset is composed of the following features:

## 
## -- Column specification --------------------------------------------------------
## cols(
##   age = col_double(),
##   sex = col_double(),
##   cp = col_double(),
##   trtbps = col_double(),
##   chol = col_double(),
##   fbs = col_double(),
##   restecg = col_double(),
##   thalachh = col_double(),
##   exng = col_double(),
##   oldpeak = col_double(),
##   slp = col_double(),
##   caa = col_double(),
##   thall = col_double(),
##   output = col_double()
## )
## # A tibble: 6 x 14
##     age   sex    cp trtbps  chol   fbs restecg thalachh  exng oldpeak   slp
##   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>    <dbl> <dbl>   <dbl> <dbl>
## 1    63     1     3    145   233     1       0      150     0     2.3     0
## 2    37     1     2    130   250     0       1      187     0     3.5     0
## 3    41     0     1    130   204     0       0      172     0     1.4     2
## 4    56     1     1    120   236     0       1      178     0     0.8     2
## 5    57     0     0    120   354     0       1      163     1     0.6     2
## 6    57     1     0    140   192     0       1      148     0     0.4     1
## # ... with 3 more variables: caa <dbl>, thall <dbl>, output <dbl>

\[~\]

Analyzing these features, we collected some interesting information about each feature in the dataset:

  1. age -> it refers to the age of the patient
  2. sex -> it refers to whether the patient is:
    • male (1)
    • female (0)
  3. cp -> it refers to the chest pain type, which can be of four types:
    • 0 is typical angina
    • 1 is atypical angina
    • 2 is non-anginal pain
    • 3 is the asymptomatic type
  4. trtbps -> it refers to the resting blood pressure (in mm Hg)
  5. chol -> it refers to the serum cholesterol in mg/dl
  6. fbs -> it refers to the fasting blood sugar, coded according to whether it is larger than 120 mg/dl:
    • 1 (true)
    • 0 (false)
  7. restecg -> it refers to the resting electrocardiographic results:
    • 0 as normal
    • 1 as having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    • 2 as showing probable or definite left ventricular hypertrophy by Estes’ criteria
  8. thalachh -> it refers to the maximum heart rate achieved
  9. exng -> it refers to exercise-induced angina, which can be:
    • 1 (yes)
    • 0 (no)
  10. oldpeak -> it refers to the previous peak (ST depression induced by exercise relative to rest)
  11. slp -> it refers to the slope of the peak exercise ST segment:
    • 0 as up sloping
    • 1 as flat
    • 2 as down sloping
  12. caa -> it refers to the number of major vessels, ranging from 0 to 3
  13. thall -> it refers to the thallium stress test result:
    • 0 as no data
    • 1 as normal
    • 2 as fixed defect
    • 3 as reversible defect
  14. output -> it refers to whether the patient had a heart attack or not

\[~\]

As we can see above, we have both qualitative and quantitative data, which will be explained in the next subchapters.

The main features detected have these summaries:

\[~\]

##       age             sex               cp             trtbps     
##  Min.   :29.00   Min.   :0.0000   Min.   :0.0000   Min.   : 94.0  
##  1st Qu.:48.00   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:120.0  
##  Median :55.50   Median :1.0000   Median :1.0000   Median :130.0  
##  Mean   :54.42   Mean   :0.6821   Mean   :0.9636   Mean   :131.6  
##  3rd Qu.:61.00   3rd Qu.:1.0000   3rd Qu.:2.0000   3rd Qu.:140.0  
##  Max.   :77.00   Max.   :1.0000   Max.   :3.0000   Max.   :200.0  
##       chol            fbs           restecg          thalachh    
##  Min.   :126.0   Min.   :0.000   Min.   :0.0000   Min.   : 71.0  
##  1st Qu.:211.0   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:133.2  
##  Median :240.5   Median :0.000   Median :1.0000   Median :152.5  
##  Mean   :246.5   Mean   :0.149   Mean   :0.5265   Mean   :149.6  
##  3rd Qu.:274.8   3rd Qu.:0.000   3rd Qu.:1.0000   3rd Qu.:166.0  
##  Max.   :564.0   Max.   :1.000   Max.   :2.0000   Max.   :202.0  
##       exng           oldpeak           slp             caa        
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:1.000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.800   Median :1.000   Median :0.0000  
##  Mean   :0.3278   Mean   :1.043   Mean   :1.397   Mean   :0.7185  
##  3rd Qu.:1.0000   3rd Qu.:1.600   3rd Qu.:2.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :6.200   Max.   :2.000   Max.   :4.0000  
##      thall           output     
##  Min.   :0.000   Min.   :0.000  
##  1st Qu.:2.000   1st Qu.:0.000  
##  Median :2.000   Median :1.000  
##  Mean   :2.315   Mean   :0.543  
##  3rd Qu.:3.000   3rd Qu.:1.000  
##  Max.   :3.000   Max.   :1.000

\[~\] As we can see above:

  1. The patients in this dataset are between 29 and 77 years old, so there are no teenagers in this dataset; this makes sense, since teenagers have a low probability of having a heart attack.
  2. The sexes are not equally distributed: about 68% of the patients are male (the mean of sex is 0.68).
  3. The output feature is roughly equally distributed between people who had a heart attack and people who did not (mean 0.543).

The behaviour of the other features can be read from this summary, so let’s move on to data visualization to get a better view of the data we are treating!

\[~\]

The Data Visualization

\[~\]

In this subsection we want to describe the dataset graphically, to get a first taste of which model may be best suited to this case. We show some interesting plots below.

\[~\]

The Visualization Using PCA

\[~\]

PCA is a particular tool (sometimes used mainly for visualization) that allows us to show which patients are similar to each other. Essentially, principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest. PCA is used in exploratory data analysis and for making predictive models. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components, obtaining lower-dimensional data while preserving as much of the data’s variation as possible.
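As a minimal sketch of the procedure described above (on toy data, since the report's preprocessing code is not shown in this excerpt), PCA in R reduces to a call to `prcomp`:

```r
# Toy stand-in for the predictor matrix: 100 "patients", 5 standardized features
set.seed(1)
X <- matrix(rnorm(100 * 5), ncol = 5)

# Principal component analysis on centered, scaled columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance captured by the first two components
cum2 <- summary(pca)$importance["Cumulative Proportion", 2]

# The first two score columns give the 2-D view of the patients
scores <- pca$x[, 1:2]
```

On the real dataset, `cum2` is the quantity reported below as the variance recovered by the two-component projection.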

\[~\]

\[~\]

It is normal to note that there can be some loss in this representation due to the number of features we want to consider, but here we actually recover 89.8% of the variance in the entire dataset using two principal components, so the preservation is good.

\[~\]

The Categorical Data

\[~\]

Here, we want to analyze the percentages of each categorical data:

\[~\]

The most relevant type is typical angina (47.4%), which is defined as substernal chest pain precipitated by physical exertion or emotional stress and relieved with rest or nitroglycerin (angina reflects a reduced flow of blood and oxygen to the heart; nitroglycerin relaxes the vessels, allowing blood to flow well through the patient’s body).

Women and elderly patients usually have atypical symptoms both at rest and during stress, often in the setting of nonobstructive coronary artery disease (CAD).

\[~\]

The ST/heart rate slope (ST/HR slope) has been proposed as a more accurate ECG criterion for diagnosing significant coronary artery disease (CAD).

The most frequent slope values fall between the second and third types (flat and down sloping); these qualitative data describe the shape of the ST segment during exercise.

\[~\]

The most common case is 0 major vessels involved, which means no vessel damage. In a smaller number of cases a few vessels are damaged (around 1-2 vessels).

\[~\]

The most common thall value corresponds to a fixed defect.

\[~\]

Angina may feel like pressure in the chest, jaw or arm. It frequently occurs with exercise or stress. Some people with angina also report feeling lightheaded, overly tired, short of breath or nauseated.

As the heart pumps harder to keep up with what you are doing, it needs more oxygen-rich blood. In our data this is reflected in the fact that most patients have no exercise-induced angina.

\[~\]

\[~\]

The Quantitative Data

\[~\]

Here, after having seen the categorical data in the previous section, we illustrate the main features of the quantitative data:

\[~\]

hist.and.summary('age', 'Persons Age')
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.00   48.00   55.50   54.42   61.00   77.00
hist.and.summary('chol', 'Cholestoral')
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   126.0   211.0   240.5   246.5   274.8   564.0
hist.and.summary('thalachh', 'Maximum Heart Rate Achieved')
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    71.0   133.2   152.5   149.6   166.0   202.0
hist.and.summary('trtbps', 'Resting Blood Pressure')
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    94.0   120.0   130.0   131.6   140.0   200.0
hist.and.summary('oldpeak', 'Previous Peak')
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.800   1.043   1.600   6.200
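The helper `hist.and.summary` used above is defined earlier in the report and not shown in this excerpt; in the report it is called with a column name, so it presumably indexes a shared data frame. A hypothetical self-contained sketch of the same idea, taking a vector directly:

```r
# Hypothetical sketch of the helper used above (its actual definition is not
# shown in this excerpt): draw a histogram and return the six-number summary
hist.and.summary <- function(x, title) {
  hist(x, main = title, xlab = title, col = "steelblue")
  summary(x)
}

# Example on a toy age vector
s <- hist.and.summary(c(29, 48, 55, 56, 61, 77), "Persons Age")
```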

\[~\]

As we can see above, the distributions of these variables differ, while some are more similar to each other; for example, if you compare the maximum heart rate and the age of the patients, those distributions are clearly different! These considerations could be useful for predicting a heart attack.

After this, we also consider the densities, distinguishing the case of having a heart attack from the case of not having one:

\[~\]

dense.chart('age', 'Persons Age')
dense.chart('chol', 'Cholestoral')
dense.chart('thalachh', 'Maximum Heart Rate Achieved')
dense.chart('trtbps', 'Resting Blood Pressure')
dense.chart('oldpeak', 'Previous Peak')

The Correlations

\[~\]

Here we want to highlight the correlations between the features, treated as qualitative, as quantitative, and without any distinction.

\[~\]

The correlations are measured considering the Pearson formula:

\[ \rho_{XY} = \frac{Cov(X,Y)}{\sigma_X \cdot \sigma_Y} \] Where:

  • \(Cov(X,Y)\) is the covariance between the two sets of values X and Y
  • \(\sigma_X\) is the standard deviation of the set X
  • \(\sigma_Y\) is the standard deviation of the set Y
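The formula above is exactly what R's `cor` computes with its default (Pearson) method; a quick check on toy vectors:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson correlation built from covariance and standard deviations...
rho_manual <- cov(x, y) / (sd(x) * sd(y))

# ...matches R's cor() with the default method = "pearson"
stopifnot(all.equal(rho_manual, cor(x, y)))
```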

With these correlation maps we see some interesting correlations. It is important to note that the quantitative variables are more meaningful here than the qualitative ones, because Pearson correlation is designed for quantitative measures; still, we can highlight and consider the qualitative variables as well. In the complete map we note that:

  • thalachh and age \(\rightarrow\) have a correlation of -0.40
  • cp and exng \(\rightarrow\) have a correlation of -0.39
  • thalachh and exng \(\rightarrow\) have a correlation of -0.38
  • slp and oldpeak \(\rightarrow\) have a correlation of -0.58

It seems that these variables will be important in our predictions; we will see in a few moments whether that is the case!

\[~\]

Preliminary Brief Definitions

\[~\]

\[~\]

The Goal (+ Cheating and Using the Frequentist Logistic Regression Approach)

\[~\]

The main goal is to carry out a fully Bayesian analysis aimed at understanding whether a person will have a heart attack or not.

So, the response variable \(Y_i\) (the “output” feature in our dataset) is the heart attack indicator \(\in \{0,1\}\), and the predictor variables (\(x_i \in \mathbb{R}^{+}\)) have been chosen using the glm() function (we are cheating a bit, but in this way we can be sure that we are catching the right variables), which fits generalized linear models, as we can see below:

\[~\]

## 
## Call:
## glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + 
##     x10 + x11 + x12 + x13, family = binomial(link = "logit"), 
##     data = dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4916  -0.3401   0.1393   0.6017   2.2757  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.92304    0.47505   4.048 5.16e-05 ***
## x1           0.11164    0.24650   0.453 0.650627    
## x2          -2.05362    0.53860  -3.813 0.000137 ***
## x3           1.03825    0.22840   4.546 5.47e-06 ***
## x4          -0.28381    0.19998  -1.419 0.155838    
## x5          -0.34242    0.22061  -1.552 0.120630    
## x6          -0.12348    0.58650  -0.211 0.833246    
## x7           0.08941    0.21458   0.417 0.676903    
## x8           0.51582    0.28428   1.814 0.069608 .  
## x9          -1.25304    0.46876  -2.673 0.007516 ** 
## x10         -0.59760    0.29303  -2.039 0.041409 *  
## x11          0.42288    0.24129   1.753 0.079680 .  
## x12         -0.63240    0.22594  -2.799 0.005126 ** 
## x13         -0.50013    0.20640  -2.423 0.015390 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 332.26  on 240  degrees of freedom
## Residual deviance: 162.55  on 227  degrees of freedom
## AIC: 190.55
## 
## Number of Fisher Scoring iterations: 6

\[~\]

As we can see above, the glm rejects the hypothesis of keeping the variables:

… and also the intercept might not be a good choice to include in our model.

An important remark to highlight is that we tried the frequentist approach on these data, and we should remember that, considering all of the features, we obtained an AIC value of 238.35.

But what happens if we try with the features that we decided to keep in the next steps?

summary(glm(y ~ x2+x3+x4+x7+x9+x10+x12+x13, family = binomial(link = "logit"),data=dat))
## 
## Call:
## glm(formula = y ~ x2 + x3 + x4 + x7 + x9 + x10 + x12 + x13, family = binomial(link = "logit"), 
##     data = dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5353  -0.3953   0.1521   0.6160   2.0389  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.7178     0.4222   4.068 4.74e-05 ***
## x2           -1.6466     0.4593  -3.585 0.000337 ***
## x3            1.0493     0.2150   4.880 1.06e-06 ***
## x4           -0.2088     0.1869  -1.118 0.263759    
## x7            0.1888     0.2012   0.939 0.347968    
## x9           -1.5525     0.4402  -3.527 0.000421 ***
## x10          -1.0173     0.2608  -3.900 9.60e-05 ***
## x12          -0.6338     0.2006  -3.159 0.001583 ** 
## x13          -0.4613     0.1955  -2.360 0.018291 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 332.26  on 240  degrees of freedom
## Residual deviance: 172.46  on 232  degrees of freedom
## AIC: 190.46
## 
## Number of Fisher Scoring iterations: 6
dat_test <- dat_test[, -c(1,5,6,8,11)]

The behaviour changes slightly: we now have an AIC value of 234.19; we will see whether we achieve the same result!

In summary, we have N = 302 observations, and we kept these features after manually checking which are the best to consider (a simple form of feature engineering). We then decided to build two models and see which one turns out to be the best in our analysis.

\[~\]

The Second Model

\[~\] Now we want to focus on another model, still a logistic regression with the whole set of features but with a new link function, the cloglog (complementary log-log) function, \(g(p) = \log(-\log(1-p))\).

Why this change? Because we want to see which model is better… and how can you see this? I’ll tell you where you can easily see this preference.

What is a link function? The link function transforms the probabilities of the levels of a categorical response variable to a continuous, unbounded scale. Once the transformation is done, the relationship between the predictors and the response can be modeled, for example with logistic regression.

We now reproduce this second model changing the link function, to see whether it is better than the previous model:
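For intuition, the frequentist analogue of this change is just swapping the `link` argument of `binomial()`; a sketch on simulated data (the actual model in this report is fitted in JAGS, and the variable names below are made up for illustration):

```r
# Simulate binary outcomes from a complementary log-log model
set.seed(42)
n <- 500
x <- rnorm(n)
p <- 1 - exp(-exp(-0.5 + 1.0 * x))   # inverse cloglog: p = 1 - exp(-exp(eta))
y <- rbinom(n, 1, p)

# Same formula, two different link functions
fit_logit   <- glm(y ~ x, family = binomial(link = "logit"))
fit_cloglog <- glm(y ~ x, family = binomial(link = "cloglog"))

# Compare the fits; the cloglog model should tend to win here,
# since it matches the data-generating link
c(logit = AIC(fit_logit), cloglog = AIC(fit_cloglog))
```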

\[~\]

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 241
##    Unobserved stochastic nodes: 6
##    Total graph size: 2498
## 
## Initializing model
## Inference for Bugs model at "C:/Users/Jeremy/AppData/Local/Temp/RtmpaytLSw/model2b4340d6ca.txt", fit using jags,
##  3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
##  n.sims = 2700 iterations saved
##          mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
## beta13    -0.392   0.081  -0.548  -0.446  -0.394  -0.339  -0.234 1.002  1300
## beta4     -0.090   0.054  -0.203  -0.125  -0.088  -0.052   0.006 1.001  2100
## beta5     -0.097   0.099  -0.293  -0.162  -0.095  -0.027   0.095 1.002  1900
## beta6     -0.545   0.253  -1.053  -0.708  -0.540  -0.371  -0.066 1.001  2700
## beta7      0.080   0.089  -0.092   0.022   0.083   0.140   0.256 1.001  2700
## beta8      0.658   0.104   0.459   0.588   0.656   0.725   0.866 1.001  2700
## deviance 271.940   3.476 267.149 269.351 271.244 273.822 280.295 1.001  2700
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 6.0 and DIC = 278.0
## DIC is an estimate of expected predictive error (lower deviance is better).

\[~\]

Can you see something different? It seems so… look at the DIC! The previous model is better (its DIC is clearly lower).

…mmh, but what is the DIC?

\[~\]

The Comparison Between These Two Models (DIC)

\[~\]

As we can see above, we built a different model changing only the link function, to see whether it suits our data better; the DIC (Deviance Information Criterion) is our indicator of which model is better.

The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. DIC is an asymptotic approximation as the sample size becomes large, like AIC.

The DIC is calculated as:

\[ DIC = p_D + \overline{D(\theta)} \]

where:

  • \(\overline{D(\theta)}\) is the posterior mean of the deviance
  • \(p_D\) is the effective number of parameters, computed here with the rule \(p_D = \widehat{\mathrm{var}}(D(\theta))/2\)

The larger the effective number of parameters is, the easier it is for the model to fit the data, and so the deviance needs to be penalized.

The lower the DIC value, the better the model; in this case the first model is better.
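Using the same rule reported by jags above (pD = var(deviance)/2), the DIC can be recomputed by hand from the saved posterior deviance draws; a sketch with illustrative numbers (stand-ins for the real MCMC samples):

```r
# Illustrative deviance draws (stand-ins for the saved MCMC samples)
dev_draws <- c(267.1, 269.4, 271.2, 273.8, 280.3)

pD  <- var(dev_draws) / 2     # effective number of parameters
DIC <- mean(dev_draws) + pD   # DIC = posterior mean deviance + pD
```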

I decided to also include other criteria similar to the DIC; we will see in detail what they represent.

\[~\]

Akaike Information Criterion

\[~\]

The Akaike information criterion (AIC) is an estimator of prediction error and thereby relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.

In estimating the amount of information lost by a model, AIC deals with the trade-off between the goodness of fit of the model and the simplicity of the model. In other words, AIC deals with both the risk of overfitting and the risk of underfitting.

The Akaike information criterion is defined as

\[ AIC(m) = D(\hat{\theta_m}, m) + 2 \cdot d_m \]

It is also a penalized deviance measure, with a penalty equal to 2 for each estimated parameter; \(d_m\) is the number of parameters considered in the model \(m\).
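For a Bernoulli (0/1) response the saturated log-likelihood is zero, so the residual deviance equals \(-2\log L\) and the formula above can be checked directly against R's `AIC` (here on a built-in dataset, not the heart data):

```r
# 0/1 response, so residual deviance = -2 * logLik
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# AIC(m) = D + 2 * d_m, with d_m = number of estimated coefficients
manual_aic <- fit$deviance + 2 * length(coef(fit))
stopifnot(all.equal(manual_aic, AIC(fit)))
```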

In our case comparing the two models we have these values of AIC:

## The AIC for the FIRST model is: 199.699252881932
## The AIC for the SECOND model is: 283.939576789391

\[~\]

Bayesian Information Criterion

\[~\]

The Bayesian information criterion (BIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. It is based, in part, on the likelihood function, and it is closely related to the Akaike information criterion (AIC).

When fitting models, it is possible to increase the likelihood by adding parameters, but doing so may result in overfitting. BIC introduces a penalty term for the number of parameters in the model; the penalty term is larger in BIC than in AIC.

The Bayesian information criterion is defined as

\[ BIC(m) = D(\hat{\theta_m}, m) + log(n) \cdot d_m \]

Where:

  • \(n\) is the number of observations considered
  • \(d_m\) is the number of coefficients considered
  • \(D(\hat{\theta_m}, m)\) is the deviance of the model

In our case, comparing the two models we have these values of BIC:

## The BIC for the FIRST model is: 231.062425283348
## The BIC for the SECOND model is: 304.848358390335

\[~\]

Comparing DIC, AIC, BIC

\[~\]

Here we considered all the criterions together:

##        DIC      AIC      BIC
## 1 191.3513 199.6993 231.0624
## 2 277.9843 283.9396 304.8484

As we can see, the three criteria give similar values for each model. They are used only as indicators of how well a model suits our data, and all of them prefer the first model.

\[~\]

The Conclusions

\[~\] We plot the logistic regression using the parameters obtained from the first model, comparing the curve for each predictor-response pair:

We prefer to show only the significant ones.

\[~\]

Parameters Recovery Simulations

\[~\] Now that we understand that the first model is better than the second, we want to check that the proposed model can correctly recover its own parameters. I ran a simulation on data simulated from the proposed model, where the beta parameters to recover are those estimated by the first model; with these values we can proceed to our last check:

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 302
##    Unobserved stochastic nodes: 9
##    Total graph size: 3417
## 
## Initializing model
## Inference for Bugs model at "C:/Users/Jeremy/AppData/Local/Temp/RtmpaytLSw/model2b467692d19.txt", fit using jags,
##  3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
##  n.sims = 2700 iterations saved
##          mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
## beta0      1.847   0.338   1.206   1.614   1.844   2.072   2.529 1.001  2700
## beta10    -1.070   0.194  -1.468  -1.197  -1.059  -0.938  -0.699 1.000  2700
## beta12    -0.777   0.177  -1.130  -0.893  -0.772  -0.656  -0.447 1.001  2700
## beta13    -0.486   0.164  -0.806  -0.600  -0.486  -0.374  -0.173 1.001  2700
## beta2     -1.721   0.369  -2.424  -1.972  -1.714  -1.474  -1.015 1.001  2700
## beta3      1.192   0.180   0.846   1.070   1.187   1.308   1.549 1.002  2700
## beta4     -0.320   0.168  -0.651  -0.432  -0.318  -0.207   0.013 1.001  2700
## beta7      0.432   0.158   0.123   0.324   0.429   0.537   0.759 1.001  2700
## beta9     -1.555   0.345  -2.255  -1.783  -1.554  -1.323  -0.899 1.001  2700
## deviance 286.890   4.373 280.598 283.714 286.145 289.376 297.213 1.005   720
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 9.5 and DIC = 296.4
## DIC is an estimate of expected predictive error (lower deviance is better).

The values are roughly the same, so the parameters are recovered fairly well, as we can see below:

##              beta0    beta10     beta12     beta13     beta2    beta3
## Estimated 1.799712 -1.100802 -0.6779106 -0.4882562 -1.747258 1.119699
## True      1.846506 -1.069995 -0.7765428 -0.4860029 -1.721039 1.192456
##                beta4     beta7     beta9
## Estimated -0.2295450 0.1973873 -1.644643
## True      -0.3201653 0.4319876 -1.554910
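The recovery check follows a standard recipe: simulate responses from the estimated coefficients, refit, and compare. A minimal frequentist sketch of the same idea, with made-up covariates and only two of the coefficients (the report does this in JAGS with all eight predictors):

```r
set.seed(1)
n <- 302
x <- rnorm(n)
beta_true <- c(1.85, -1.07)   # e.g. beta0 and beta10 from the table above

# Simulate binary outcomes from the "true" logistic model, then refit
p <- plogis(beta_true[1] + beta_true[2] * x)
y <- rbinom(n, 1, p)
refit <- glm(y ~ x, family = binomial)

# Estimated coefficients should land near beta_true
rbind(True = beta_true, Estimated = unname(coef(refit)))
```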

\[~\]

… Final Conclusions

\[~\]

We saw the strengths and the weaknesses of Bayesian analysis.

As a first step we built the model in JAGS, “cheating” a little bit on which independent variables might be good for our purposes; after that we made our usual considerations and then compared it with another, less good model. Of course, many little things could still be improved in the future in this analysis, which, in my opinion, was an amazing one to carry out.

So thanks for all!

\[~\]

Further Work

\[~\]

Here I point out several improvements that could be made in the future:

  1. Try to create different models, using different binary classifiers such as SVM, K-Nearest Neighbours, Decision Trees, etc.
  2. Oversample the dataset to see other practical predictions
  3. Add more interesting plots to describe each subchapter
  4. After oversampling the dataset, try a validation set to tune the parameters better
  5. Try to use other link functions
  6. Do better feature engineering on the chosen features

The References

\[~\]

  1. Heart Attack Analysis & Prediction Dataset - [Kaggle] https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset

  2. For graphical plots - [Kaggle] https://www.kaggle.com/chaitanya99/heart-attack-analysis-prediction/data

  3. Link functions, the model types - [Alteryx] https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Selecting-a-Logistic-Regression-Model-Type-Logit-Probit-or/ta-p/111269

  4. HighCharter, for the amazing plots in R - [High Charter] https://jkunst.com/highcharter/articles/hchart.html

  5. Coronary artery disease - [PubMed] https://pubmed.ncbi.nlm.nih.gov/3739881

  6. Nitroglycerin - [Wikipedia] https://en.wikipedia.org/wiki/Nitroglycerin

  7. Vessel Disease - [Digirad] https://www.digirad.com/triple-vessel-disease-diagnosis-treatment/

  8. Different types of Angina - [CardioSmart] https://www.cardiosmart.org/topics/angina

  9. Logistic Regression Bayesian Model R - [BayesBall] https://bayesball.github.io/BOOK/bayesian-multiple-regression-and-logistic-models.html

  10. Typical Angina - [NCBI] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5680106

  11. RDocumentation - [RDocumentation] https://www.rdocumentation.org

  12. Spectrum of ECG changes during exercise testing - [SomePoMed] https://somepomed.org/articulos/contents/mobipreview.htm?13/38/13930

\[~\]

\[~\]